Abstract

Saprotrophy on plant biomass is a recently developed nutrition strategy in Trichoderma. However, the physiology and evolution of this new nutrition strategy remain elusive. We report the deep sequencing and analysis of the genome of Trichoderma longibrachiatum, an efficient cellulase producer. The 31.7-Mb genome, the smallest among the sequenced Trichoderma species, encodes fewer nutrition-related genes than that of the saprotrophic T. reesei (Tr), including glycoside hydrolases and nonribosomal peptide synthetase-polyketide synthases. Homology and phylogenetic analyses suggest that a large number of nutrition-related genes, including GH18 chitinases, β-1,3/1,6-glucanases, cellulolytic enzymes, and hemicellulolytic enzymes, were lost in the common ancestor of T. longibrachiatum (Tl) and Tr. dN/dS (ω) calculation indicates that all the nutrition-related genes analyzed are under purifying selection. Cellulolytic enzymes, the key enzymes for saprotrophy on plant biomass, are under stronger purifying selection pressure in Tl and Tr than in mycoparasitic species, suggesting that the development of saprotrophy on plant biomass as a nutrition strategy has increased the selection pressure. In addition, aspartic proteases, serine proteases, and metalloproteases are subject to stronger purifying selection pressure in Tl and Tr, suggesting that these enzymes may also play important roles in nutrition. This study provides insights into the physiology and evolution of the nutrition strategy of Trichoderma.

Key words: Trichoderma longibrachiatum, cellulolytic enzymes, carbohydrate-active enzymes, proteases, purifying selection, 
dN/dS. 



Introduction 

Trichoderma (teleomorph Hypocrea) species are highly interactive in root, soil, and foliar environments and are among the most commonly isolated saprotrophic fungi (Harman et al. 2004; Druzhinina et al. 2011). Most Trichoderma species can grow on both living fungi (mycoparasitism) and dead fungal substances (saprotrophy on fungal substances), and their nutrition strategy is referred to as mycotrophy (Druzhinina et al. 2011). Because many of their prey are plant pathogenic fungi, some Trichoderma species, for example, Trichoderma atroviride (Ta) and T. virens (Tv), are used as biocontrol agents (Harman et al. 2004). Although mycotrophy is considered the ancestral and major lifestyle of Trichoderma (Kubicek et al. 2011), several recently derived taxa of the genus, which occupy terminal positions in the phylogenetic trees, seem to have shifted to new ecological niches (Druzhinina et al. 2011). For example, T. reesei (Tr) specializes in colonizing dead wood, T. longibrachiatum (Tl) can colonize immunocompromised humans, and some species are isolated as endophytes (symptomless growth inside plant tissue) (Druzhinina et al. 2011).

Comparative genomics of Ta, Tv, and Tr suggests that mycoparasitic species have a large set of mycoparasitism-related genes, including carbohydrate-active enzymes (CAZymes) and secondary metabolism-related genes (Kubicek et al. 2011). In comparison, the saprotrophic species Tr has a smaller set of CAZymes and secondary metabolism-related genes, consistent with its lower mycoparasitic ability. Phylogenetic analysis suggests that the mycotrophy-related genes arose in the common ancestor of Trichoderma, which had the ancestral lifestyle of mycotrophy, and that some of these genes were subsequently lost in the saprotrophic Tr (Kubicek et al. 2011). This conclusion is consistent with the hypothesis that Tr became an efficient saprotroph on dead wood by following wood-degrading fungi into their habitat (Rossman et al. 1999). Because genome sequences of other Trichoderma species, especially those closely related to Tr, are currently unavailable, it is still unclear whether similar genome reduction has occurred in other Trichoderma species.

Trichoderma sp. SMF2, first published as T. koningii, is a biocontrol fungus that has a strong inhibitory ability against plant pathogenic fungi and Gram-positive bacteria (Song et al. 2006). It was reclassified as T. pseudokoningii based on molecular and morphological data (Chen et al. 2009). The secondary metabolites peptaibols and the extracellular enzymes secreted by SMF2 are thought to be important factors contributing to its inhibitory ability against pathogens. Peptaibols (Song et al. 2007; Luo et al. 2010; Shi et al. 2010; Su et al. 2012) and proteases (Chen et al. 2009) secreted by SMF2 are used in our laboratory to study the biocontrol mechanism of Trichoderma. To gain insights into its physiology and biocontrol mechanism, the genome of SMF2 was deeply sequenced, and the nutrition-related genes, including those encoding chitinases, β-1,3/1,6-glucanases, cellulolytic enzymes, hemicellulolytic enzymes, and proteases, were systematically annotated. Furthermore, SMF2 was reclassified as Tl based on phylogenetic analysis with the tef1, cal1, and chi18-5 genes as molecular markers.

Both Tl and Tr belong to the Longibrachiatum clade of Trichoderma (Druzhinina et al. 2012; Samuels et al. 2012). Members of this clade are best known as producers of cellulose-hydrolyzing enzymes (Tr and Tl), as causes of opportunistic infections of humans and animals (Tl and close relatives), and for their association with wet building materials (Samuels et al. 2012). To gain further insights into genome evolution and the genetic basis underlying the evolution of nutrition strategy, the Tl genome was compared with those of Tr, Ta, and Tv. Our results show that the nutrition strategy is related not only to the number of nutrition-related genes but also to the selection pressure on these genes. This study will improve our understanding of the physiology and evolution of the Trichoderma nutrition style.



Materials and Methods 

Fungal Strains and Cultivation Conditions 

For genome sequencing, Tl SMF2 was grown in 0.2% potato dextrose medium (Sigma, USA) with shaking at 120 rpm for 72 h at 28 °C.

Genome Sequencing and Assembly 

For genome sequencing, fungal mycelia were collected at 72 h from 0.2% potato dextrose medium. DNA was prepared using the E.Z.N.A. Fungal DNA Mini Kit (OMEGA, USA) and was freeze-dried for genome sequencing. Using a combination of Roche 454 and Illumina Solexa technologies, the Tl genome was sequenced to an average of 69-fold coverage. A shotgun library was sequenced with Roche 454, resulting in 2,555,045 reads (852,363,657 bp). Two paired-end libraries (200 bp and 2 kb) were sequenced using Illumina Solexa (read length, 44 bp), producing 15,719,200 clean reads (691,644,800 bp, 200-bp library) and 14,538,236 clean reads (639,682,384 bp, 2-kb library), respectively. The 454 reads and Solexa reads were assembled together using MIRA v3.4.0 (Chevreux et al. 1999), which resulted in an assembly of 31,735,570 bp in 365 large contigs (>1 kb). These large MIRA contigs were then assembled into the final assembly with the help of the paired-end information using SSPACE basic v2.0 (Boetzer et al. 2011).
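The reported 69-fold coverage follows from the base counts above. A minimal Python sketch of that arithmetic, assuming coverage is defined as total sequenced bases divided by the 31,735,570-bp MIRA assembly length (the dictionary keys are illustrative labels, not names used in the study):

```python
# Sequenced bases per library, as reported in the text.
LIBRARY_BASES = {
    "454_shotgun": 852_363_657,
    "solexa_pe_200bp": 691_644_800,
    "solexa_pe_2kb": 639_682_384,
}
ASSEMBLY_BP = 31_735_570  # MIRA assembly length in large contigs (>1 kb)


def solexa_bases(clean_reads: int, read_len: int = 44) -> int:
    """Bases contributed by a Solexa library with fixed-length reads."""
    return clean_reads * read_len


def mean_coverage(library_bases: dict, genome_bp: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    return sum(library_bases.values()) / genome_bp


# The Solexa base counts are consistent with 44-bp reads.
assert solexa_bases(15_719_200) == LIBRARY_BASES["solexa_pe_200bp"]
assert solexa_bases(14_538_236) == LIBRARY_BASES["solexa_pe_2kb"]

print(f"{mean_coverage(LIBRARY_BASES, ASSEMBLY_BP):.1f}x")  # prints 68.8x
```

The result, about 68.8-fold, rounds to the 69-fold stated above; dividing by the 31,747,380-bp final scaffold assembly instead gives essentially the same value.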

Assembly sequences, gene coordinates, and annotation data for Tl are available through anonymous ftp (ftp://222.206.24.193, last accessed February 11, 2014). The genome assembly has been deposited at DDBJ/EMBL/GenBank under the accession ANBJ00000000. The version described in this article is the first version (GenBank: ANBJ01000000).

Genome Data for Comparative Analysis 

Trichoderma genome sequence files and gene coordinate files were downloaded from the Department of Energy Joint Genome Institute (JGI) Genome Portal (http://genome.jgi.doe.gov/, last accessed February 11, 2014) for Tr (http://genome.jgi.doe.gov/Trire2/Trire2.home.html, last accessed February 11, 2014), Tv (http://genome.jgi.doe.gov/TriviGv29_8_2/TriviGv29_8_2.home.html, last accessed February 11, 2014), and Ta (http://genome.jgi.doe.gov/Triat2/Triat2.home.html, last accessed February 11, 2014). Annotations were downloaded from GenBank (Tr, AAIL02000000; Tv, ABDF02000000; and Ta, ABDG02000000). Fusarium graminearum data were downloaded from the Fusarium Comparative Sequencing Project, Broad Institute of Harvard and Massachusetts Institute of Technology (http://www.broadinstitute.org/, last accessed February 11, 2014). Neurospora crassa data were downloaded from the N. crassa Sequencing Project, Broad Institute of Harvard and Massachusetts Institute of Technology (http://www.broadinstitute.org/, last accessed February 11, 2014).



380 Genome Biol. Evol. 6(2):379-390. doi:10.1093/gbe/evu018 Advance Access publication January 29, 2014 



Evolution of Trichoderma Nutrition Style 



GBE 



Gene Prediction 

First, gene models were predicted using the de novo predictor Fgenesh, version 2.6 (Salamov and Solovyev 2000), with parameters trained for the fungi Fusarium/Pezizomycotina. Then, regions without Fgenesh models were searched with BlastX against the gene sets of the three published Trichoderma genomes. The top BlastX hits (identity above 50%) were used to predict gene models with the help of Genewise (Birney and Durbin 2000). Genes shorter than 100 bp were excluded. As a result, we obtained 9,409 nonredundant models.

Gene Annotation 

Protein sequences were searched against the SwissProt database (http://www.ebi.ac.uk/uniprot/, last accessed February 11, 2014) using BlastP, with E value < 1E-10, alignment identity > 35%, and alignment score > 60 as filters. In addition, the alignment length had to exceed half of both the query length and the target length, and the difference between the query and target lengths had to be less than 25% of the shorter one. As a result, 2,981 proteins had a match in SwissProt. The metabolic pathways were annotated using KEGG (Kanehisa et al. 2004) on the KAAS server (http://www.genome.jp/tools/kaas/, last accessed February 11, 2014), with the bidirectional best hit method and fungal genomes as reference. The protein domains were annotated using the Pfam database version 26.0 and the program pfamscan. Nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) genes were identified using the SMURF server (http://www.jcvi.org/smurf/index.php, last accessed February 11, 2014) (Khaldi et al. 2010).
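The SwissProt filtering criteria above combine several thresholds. A minimal Python sketch, assuming each BlastP hit has already been parsed into a record with the fields shown (the `Hit` type and its field names are hypothetical, and "score" is assumed to be the bit score):

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One parsed BlastP hit (hypothetical record layout)."""
    query_len: int    # query protein length (aa)
    target_len: int   # SwissProt protein length (aa)
    aln_len: int      # alignment length (aa)
    identity: float   # percent identity
    evalue: float
    score: float      # assumed to be the bit score


def passes_swissprot_filters(hit: Hit) -> bool:
    """Apply the annotation thresholds described in the text."""
    if not (hit.evalue < 1e-10 and hit.identity > 35 and hit.score > 60):
        return False
    # Alignment must be longer than half of both the query and the target.
    if hit.aln_len <= hit.query_len / 2 or hit.aln_len <= hit.target_len / 2:
        return False
    # Length difference must be below 25% of the shorter sequence.
    shorter = min(hit.query_len, hit.target_len)
    return abs(hit.query_len - hit.target_len) < 0.25 * shorter


good = Hit(query_len=400, target_len=420, aln_len=380,
           identity=60.0, evalue=1e-50, score=300.0)
print(passes_swissprot_filters(good))  # prints True
```

The length-difference rule rejects hits to proteins of very different size even when the local alignment itself is strong, which guards against transferring annotations from multidomain proteins.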

The tRNA genes were predicted using tRNAscan-SE, version 1.3 (Schattner et al. 2005). The rRNA genes were predicted using the RNAmmer 1.2 Server (http://www.cbs.dtu.dk/services/RNAmmer/, last accessed February 11, 2014) (Lagesen et al. 2007).

Reclassification of SMF2 Based on Molecular Phylogeny 

Two methods were used to reclassify SMF2. First, SMF2 was classified using the TrichOKEY v. 2.0 server (http://isth.info/, last accessed February 11, 2014) (Druzhinina et al. 2005), which uses a combination of several oligonucleotides located within the internal transcribed spacer 1 and 2 (ITS1 and 2) sequences of the rDNA repeat to quickly identify Hypocrea/Trichoderma at the genus and species levels. With the ITS sequence of SMF2 (GenBank Accession FJ605099.1) as input, SMF2 was classified as Tl or Hypocrea orientalis, both with high identification reliability as reported by the server.

Second, we classified SMF2 using the tef1, cal1, and chi18-5 genes (Druzhinina et al. 2012). The DNA sequences of these genes for the species in the section Longibrachiatum of Trichoderma were obtained from the National Center for Biotechnology Information (NCBI) GenBank based on the accessions listed in the study of Druzhinina et al. (2012). The corresponding gene sequences in the SMF2 genome were obtained by searching against the predicted gene sequences of SMF2 using BlastN. The sequences for each gene were then aligned separately with Muscle version 3.8.31 (Edgar 2004). Each alignment was visually checked with the help of BioEdit (Hall 1999). To obtain a better phylogenetic tree, several short sequences were excluded from the alignments. Next, the poorly aligned regions were removed from each alignment using Gblocks version 0.91b (Castresana 2000) with default parameters for nucleotide sequences. The sequences for each strain were checked, and only strains for which the tef1, chi18-5, and cal1 genes were all included in the alignments were used to create a concatenated alignment of the three genes. Finally, the concatenated alignment (including 952 sites) of the three genes was used to construct a phylogenetic tree using Metropolis-coupled Markov chain Monte Carlo sampling with MrBayes version 3.2.2 (Ronquist et al. 2012). The GTR+I+Γ nucleotide substitution model was used, and two simultaneous runs of four incrementally heated chains were performed for 5 million generations. The accessions for the sequences used in this study are listed in supplementary table S17, Supplementary Material online.
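The strain-filtering and concatenation step described above can be sketched as follows, assuming each trimmed (Gblocks-processed) alignment is held as a dictionary from strain name to aligned sequence; the function name and the toy strain labels are hypothetical:

```python
def concatenate_shared(alignments, gene_order=("tef1", "cal1", "chi18-5")):
    """Concatenate per-gene alignments for strains present in every gene.

    alignments: {gene: {strain: aligned_sequence}}; all sequences within one
    gene alignment must have the same length.
    """
    for gene in gene_order:
        lengths = {len(seq) for seq in alignments[gene].values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
    # Keep only strains that have all three genes in the alignments.
    shared = set.intersection(*(set(alignments[g]) for g in gene_order))
    return {
        strain: "".join(alignments[g][strain] for g in gene_order)
        for strain in sorted(shared)
    }


toy = {
    "tef1": {"SMF2": "ACGT", "strainA": "ACGA", "strainB": "ACGG"},
    "cal1": {"SMF2": "TT-A", "strainA": "TTCA"},
    "chi18-5": {"SMF2": "GG", "strainA": "GC", "strainB": "GA"},
}
concat = concatenate_shared(toy)
print(concat)  # {'SMF2': 'ACGTTT-AGG', 'strainA': 'ACGATTCAGC'}
```

Here strainB is dropped because it lacks a cal1 sequence, mirroring the rule that only strains with all three genes enter the concatenated matrix.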

Orthologous Genes 

The orthologous gene families in the four Trichoderma species were analyzed using BlastP and OrthoMCL (Li et al. 2003), with an E-value cutoff of 1E-5, an identity cutoff of 50%, and an inflation index I = 1.5.

To construct a phylogenetic tree for the Trichoderma spp. based on protein sequences, we first computed the orthologous gene groups for the four Trichoderma spp. and the F. graminearum and N. crassa genomes using OrthoMCL (Li et al. 2003), with an E-value cutoff of 1E-5, a percentage identity cutoff of 50%, and an inflation index I = 1.5. As a result, 5,145 homologous groups in which each genome has exactly one orthologous gene were obtained. Each group was then aligned separately using Muscle version 3.8.31 (Edgar 2004), the aligned sequences of all the groups were concatenated, and conserved blocks were extracted for phylogenetic analysis using Gblocks version 0.91b (Castresana 2000) with default parameters. The final alignment contained 2,143,124 sites and was used to construct a neighbor-joining tree using MEGA version 5.05 (Tamura et al. 2011), with the JTT model and 500 bootstrap replications.
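Selecting the 5,145 groups in which every genome contributes exactly one gene amounts to a count check on the OrthoMCL output. A sketch, assuming each group is a list of gene IDs prefixed with a genome code and a "|" separator (a hypothetical ID convention, not a format OrthoMCL requires):

```python
from collections import Counter

GENOMES = ("Tl", "Tr", "Ta", "Tv", "Fg", "Nc")  # six genomes; codes are illustrative


def single_copy_groups(groups, genomes=GENOMES):
    """Return IDs of groups containing exactly one gene from every genome."""
    selected = []
    for group_id, gene_ids in groups.items():
        counts = Counter(gene.split("|", 1)[0] for gene in gene_ids)
        one_each = all(counts[g] == 1 for g in genomes)
        if one_each and len(gene_ids) == len(genomes):
            selected.append(group_id)
    return selected


groups = {
    "OG1": ["Tl|1", "Tr|1", "Ta|1", "Tv|1", "Fg|1", "Nc|1"],
    "OG2": ["Tl|2", "Tl|3", "Tr|2", "Ta|2", "Tv|2", "Fg|2", "Nc|2"],  # Tl paralogs
    "OG3": ["Tl|4", "Tr|4", "Ta|4", "Tv|4", "Fg|4"],  # missing N. crassa
}
print(single_copy_groups(groups))  # prints ['OG1']
```

Restricting the species tree to one-to-one groups avoids mistaking paralogs for orthologs when the per-group alignments are concatenated.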

Annotation of CAZymes

We first retrieved CAZyme sequences based on the GenBank annotations of the Tr, Ta, and Tv genomes and annotated the Pfam domains in these sequences. Then, the discovered Pfam domains were checked against CAZy, and those unambiguously affiliated with a CAZy family were used to compile a dictionary (supplementary table S6, Supplementary Material online) to further identify CAZymes from the four Trichoderma genomes. Several additional Pfam domains were also manually incorporated into the dictionary to predict more CAZymes. Finally, the four Trichoderma genomes were annotated using Pfam, and the sequences with a Pfam domain in the dictionary were classified as CAZymes.
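The dictionary-based classification above can be sketched in Python. The three dictionary entries are only an illustrative subset of commonly used Pfam-to-CAZy correspondences (Glyco_hydro_18 to GH18, Glyco_hydro_7 to GH7, Polysacc_deac_1 to CE4), not the authors' supplementary table S6, and per-protein Pfam hits are assumed to be available as sets of Pfam names:

```python
from collections import Counter

# Illustrative subset of a Pfam-domain -> CAZy-family dictionary.
PFAM_TO_CAZY = {
    "Glyco_hydro_18": "GH18",
    "Glyco_hydro_7": "GH7",
    "Polysacc_deac_1": "CE4",
}


def cazy_families(pfam_domains, dictionary=PFAM_TO_CAZY):
    """CAZy families implied by a protein's Pfam domains (empty if none)."""
    return {dictionary[d] for d in pfam_domains if d in dictionary}


def count_by_class(proteins, dictionary=PFAM_TO_CAZY):
    """Count CAZymes per class (GH, GT, PL, CE), each protein once per class."""
    counts = Counter()
    for domains in proteins.values():
        classes = {family[:2] for family in cazy_families(domains, dictionary)}
        counts.update(classes)
    return counts


proteins = {
    "p1": {"Glyco_hydro_18", "CBM_1"},
    "p2": {"Glyco_hydro_7"},
    "p3": {"Polysacc_deac_1"},
    "p4": {"Pkinase"},  # no dictionary domain: not a CAZyme
}
print(count_by_class(proteins))  # prints Counter({'GH': 2, 'CE': 1})
```

Counting each protein once per class keeps multidomain proteins from inflating the per-class totals reported in table 1.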

Evolutionary Changes of Numbers of Nutrition-Related Genes

Gene numbers of chitinases, β-1,3/1,6-glucanases, cellulolytic enzymes, and hemicellulolytic enzymes in the ancestral species and gains and losses of these genes along each lineage were estimated using the reconciled tree method (Goodman et al. 1979; Page and Charleston 1997; Nam and Nei 2005; Niimura and Nei 2007). This method identifies the differences between a species tree and the phylogenetic tree of a gene family and then fits the gene tree into the species tree by modeling these differences as gene gains and losses parsimoniously (i.e., by finding the minimum number of gene duplications plus gene losses). For each type of enzyme, the sequences were first grouped based on the homologous gene families calculated using BlastP and OrthoMCL (Li et al. 2003). The sequences of each group were aligned using Muscle version 3.8.31 (Edgar 2004) and used to construct a neighbor-joining tree using MEGA version 5.05 with the JTT model and 500 bootstrap replicates. Groups containing overly distantly related sequences were further divided into subgroups by checking the length and bootstrap support of each branch of the neighbor-joining tree. The neighbor-joining tree was processed by the C program branchout, and the output file was then used to count the gene changes using the Perl script mrcacount.pl (Niimura and Nei 2007). A bootstrap cutoff of 70% was used in the analysis. The phylogenetic tree in figure 1 was used as the species tree. Both the program branchout and the Perl script mrcacount.pl were kindly provided by Yoshihito Niimura. The number of genes in each ancestral node and the gene gain and loss events along each branch were summed over all the groups and subgroups to obtain the results presented in figure 2. For chitinases, the BlastP and OrthoMCL analyses revealed four closely related homologs (one for Tr, two for Ta, and one for Tv). These homologs do not contain a Pfam domain that was used to classify CAZymes, and therefore, they were not included in tables 1 and 2. Despite this annotation inconsistency, these homologs were used in estimating gene gains and losses in chitinase evolution.

Fig. 1. A consensus neighbor-joining tree for Tl and close relatives (T. longibrachiatum, T. reesei, T. virens, T. atroviride, F. graminearum, and N. crassa). The tree was created based on 2,143,124 sites in 5,145 orthologous proteins using the JTT matrix and 500 bootstrap replications. Bootstrap percentages are shown on the branches. The bar represents 0.05 substitutions per site.
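The core of the reconciled tree method can be illustrated with a short LCA-mapping sketch in Python. This is not the branchout/mrcacount.pl pipeline used in the study; it assumes fully resolved (binary) rooted trees written as nested tuples, gene IDs prefixed with a species code and an underscore (a hypothetical convention), no bootstrap collapsing, and the topology (((Tl, Tr), Tv), Ta) as an assumed reading of figure 1. Each gene-tree node is mapped to the smallest species-tree clade containing its species; a node is a duplication when it maps to the same clade as one of its children, and losses are counted from the species-tree levels skipped along each gene-tree branch. Losses above the gene-tree root are not counted.

```python
SPECIES_TREE = ((("Tl", "Tr"), "Tv"), "Ta")  # assumed topology


def index_clades(tree):
    """Map every species-tree clade (frozenset of species) to its depth."""
    depths = {}

    def walk(node, depth):
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(walk(child, depth + 1) for child in node))
        depths[clade] = depth
        return clade

    walk(tree, 0)
    return depths


def lca(species, depths):
    """Smallest species-tree clade containing all the given species."""
    return min((clade for clade in depths if species <= clade), key=len)


def reconcile(gene_tree, depths):
    """Return (mapped clade, duplications, losses) for a binary gene tree."""
    if isinstance(gene_tree, str):
        return lca(frozenset([gene_tree.split("_")[0]]), depths), 0, 0
    children = [reconcile(child, depths) for child in gene_tree]
    clade = lca(frozenset().union(*(c[0] for c in children)), depths)
    dups = sum(c[1] for c in children)
    losses = sum(c[2] for c in children)
    is_dup = any(c[0] == clade for c in children)
    for child_clade, _, _ in children:
        skipped = depths[child_clade] - depths[clade]
        # A speciation node accounts for one level itself; a duplication does not.
        losses += skipped if is_dup else skipped - 1
    return clade, dups + int(is_dup), losses


depths = index_clades(SPECIES_TREE)
_, dups, losses = reconcile((("Tl_1", "Tr_1"), "Ta_1"), depths)
print(dups, losses)  # prints 0 1  (one loss, on the Tv lineage)
```

Summing duplications and losses over many gene families, branch by branch, yields the kind of per-lineage gain/loss counts summarized in figure 2.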

Annotation of Proteases 

Proteases were annotated by searching against the peptidase 
database MEROPS Release 9.6 (http://merops.sanger.ac.uk/, 
last accessed February 11, 2014) (Rawlings et al. 2012) 
using BlastP, with identity cutoff 35%, E-value cutoff 1E-5, 
and score cutoff 30. Sequences whose best hit was a 
"nonpeptidase homolog" or "peptidase inhibitor" were 
discarded. 
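A sketch of the MEROPS-based filtering, assuming the BlastP results are available as tuples and that each MEROPS subject carries a category label. The record layout, the category strings, and the choice to apply the cutoffs before picking each query's best hit (by score) are assumptions, and the hit IDs below are placeholders rather than real MEROPS identifiers:

```python
from typing import Dict, Iterable, Tuple

# (query_id, merops_hit_id, identity %, e-value, score, merops_category)
BlastRecord = Tuple[str, str, float, float, float, str]

EXCLUDED = {"nonpeptidase homolog", "peptidase inhibitor"}


def annotate_proteases(records: Iterable[BlastRecord]) -> Dict[str, str]:
    """Return {query: best MEROPS hit} for queries kept as proteases."""
    best: Dict[str, BlastRecord] = {}
    for rec in records:
        query, _, identity, evalue, score, _ = rec
        if identity < 35 or evalue > 1e-5 or score < 30:
            continue  # fails the identity, E-value, or score cutoff
        if query not in best or score > best[query][4]:
            best[query] = rec
    # Drop queries whose best hit is a nonpeptidase homolog or an inhibitor.
    return {q: rec[1] for q, rec in best.items() if rec[5] not in EXCLUDED}


records = [
    ("g1", "hitA", 60.0, 1e-40, 200.0, "peptidase"),
    ("g1", "hitB", 70.0, 1e-20, 120.0, "peptidase"),
    ("g2", "hitC", 50.0, 1e-30, 150.0, "peptidase inhibitor"),
    ("g3", "hitD", 20.0, 1e-30, 150.0, "peptidase"),
]
print(annotate_proteases(records))  # prints {'g1': 'hitA'}
```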

Calculation of dN and dS

For the calculation of dN and dS, each homology group of protein sequences was aligned using Muscle version 3.8.31 (Edgar 2004). Gap-containing columns were removed from the amino acid alignment. Then the nucleotide sequence alignment for each homology group was created by retrieving the corresponding codon from the gene sequence for each
Fig. 2. Gains and losses of genes for chitinases (A), β-1,3/1,6-glucanases (B), cellulolytic enzymes (C), and hemicellulolytic enzymes (D). Numbers in boxes indicate the numbers of genes in the extant and ancestral species. Numbers with plus and minus signs on branches indicate gene gains and losses. For chitinases (A), gene gains and losses were estimated based on the sequences of chitinases and the closely related homologs (see Materials and Methods). Two gene numbers are presented for the extant species, with the left one indicating the number of chitinases plus the closely related homologs and the right one indicating the number of chitinases.



Table 1

Comparison of CAZymes of Trichoderma

Species   GH    GT    PL   CE   Total
Tl        165   93    4    19   281
Tr        174   94    4    19   291
Ta        218   97    8    26   349
Tv        233   101   7    26   367
Table 2

Comparison of Major Nutrition-Related CAZymes of Trichoderma

          Fungal Cell Wall Polysaccharide-Degrading Enzymes    Plant Cell Wall Polysaccharide-Degrading Enzymes
Species   Chitinases   β-1,3/1,6-Glucanases                    Cellulolytic Enzymes   Hemicellulolytic Enzymes
Tl        17           13                                      17                     10
Tr        19           13                                      17                     10
Ta        26           15                                      21                     18
Tv        31           18                                      24                     18



Table 3

Comparison of Trichoderma Genomes

Species   Size^a (Mb)   Coverage^a   Gaps^a (Mb)   Scaf. Num.^a   GC%^b   Gene Num.^a   Gene Len.^a (bp)
Tl        31.7          69x          0.01          185            54.0    9,409         1,654
Tr        33.9          9.0x         0.05          89             52.8    9,143         1,793
Ta        36.1          8.3x         0.1           50             49.7    11,865        1,747
Tv        39.0          8.1x         0.2           135            49.2    12,518        1,710

^a Data for Tr, Ta, and Tv were adopted from Kubicek et al. (2011).

^b Data for Tr, Ta, and Tv were calculated based on sequence data obtained from DOE JGI (http://genome.jgi-psf.org/, last accessed February 11, 2014).



residue in the amino acid alignment. Pairwise nucleotide alignments were obtained by directly retrieving the sequences from the alignment of the homology group. dN, dS, and dN/dS (ω) values were calculated using KaKs_Calculator with the MS model (Zhang et al. 2006). Homology groups with dN or dS values >2 were excluded from the data because such high substitution rates are probably poorly estimated. Those with P values > 0.001 were also excluded to obtain reliable results.
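The codon-alignment construction and the two exclusion rules above can be sketched in Python. Estimating dN and dS themselves is left to KaKs_Calculator, as in the text; the sketch assumes intron-free coding sequences whose codons correspond one-to-one to the protein residues (a trailing stop codon is simply never used), and the function names are hypothetical:

```python
def codon_alignment(protein_aln, cds):
    """Back-translate a protein alignment into a codon alignment.

    protein_aln: {name: aligned protein with '-' gaps}
    cds: {name: ungapped coding sequence, 3 nt per residue}
    Columns containing a gap in any sequence are dropped first.
    """
    names = list(protein_aln)
    n_cols = len(protein_aln[names[0]])
    keep = [i for i in range(n_cols)
            if all(protein_aln[n][i] != "-" for n in names)]
    codon_aln = {}
    for name in names:
        codon_at = {}
        residue = 0  # index of the residue in the ungapped protein
        for col, aa in enumerate(protein_aln[name]):
            if aa != "-":
                codon_at[col] = cds[name][3 * residue:3 * residue + 3]
                residue += 1
        codon_aln[name] = "".join(codon_at[col] for col in keep)
    return codon_aln


def keep_estimate(dn, ds, p_value, max_rate=2.0, max_p=0.001):
    """Keep an estimate only if dN <= 2, dS <= 2, and P <= 0.001."""
    return dn <= max_rate and ds <= max_rate and p_value <= max_p


def omega(dn, ds):
    """dN/dS ratio; undefined (None) when dS is zero."""
    return None if ds == 0 else dn / ds


aln = codon_alignment({"a": "M-K", "b": "MRK"},
                      {"a": "ATGAAA", "b": "ATGCGTAAA"})
print(aln)  # prints {'a': 'ATGAAA', 'b': 'ATGAAA'}
```

An ω well below 1 for a retained pair is the signature of purifying selection discussed in the abstract.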

Results 

Sequencing of SMF2 Genome 

The SMF2 genome was sequenced to an average of 69-fold coverage using a combination of Roche 454 and Illumina Solexa technologies (supplementary table S1, Supplementary Material online). The final assembly contains 316 contigs in 185 scaffolds with a total size of 31,747,380 bp (including 13,491 N's; see supplementary table S2, Supplementary Material online, for details of the current assembly of the SMF2 genome). The 31.7-Mb genome of SMF2 is the smallest among the four sequenced Trichoderma spp. (table 3) and similar to that of Tr (33.9 Mb). The GC content of the assembly is 54.0%, the highest among the sequenced Trichoderma spp. (52.7% for Tr, 49.7% for Ta, and 49.2% for Tv).
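GC content is the fraction of G and C among the sequenced bases. A minimal sketch, assuming ambiguous N positions (such as the 13,491 N's in the assembly) are excluded from the denominator; the text does not state how they were handled, and the function names are hypothetical:

```python
def gc_percent(sequence: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); N and other codes ignored."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("no unambiguous bases")
    return 100 * (seq.count("G") + seq.count("C")) / counted


def gc_percent_multi(scaffolds):
    """Pooled GC content over several scaffolds (not an average of per-scaffold values)."""
    return gc_percent("".join(scaffolds))


print(gc_percent("GGCCAATTNN"))  # prints 50.0
```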

A combination of an ab initio gene predictor (e.g., Fgenesh [Salamov and Solovyev 2000]) and homology-based gene predictors (e.g., Genewise [Birney and Durbin 2000] and Fgenesh-h [http://www.softberry.com, last accessed February 11, 2014]) was used to predict protein-coding genes. The final gene set comprised 9,409 models, including 5,779 complete models (with both start and stop codons) and 3,630 partial models (lacking a start codon and/or a stop codon). This number is slightly larger than that of Tr (9,129) but much smaller than those of Ta (11,863) and Tv (12,427), consistent with the similar genome sizes of SMF2 and Tr.
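The split into complete and partial models rests on the presence of start and stop codons. A sketch, assuming each model's CDS is available as a nucleotide string, the standard genetic code applies (ATG start; TAA, TAG, TGA stops), and an in-frame length is also required; the function names are hypothetical:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete(cds: str) -> bool:
    """Complete model: in-frame CDS that begins with ATG and ends with a stop codon."""
    cds = cds.upper()
    return (len(cds) >= 6 and len(cds) % 3 == 0
            and cds.startswith("ATG") and cds[-3:] in STOP_CODONS)


def tally_models(cds_list):
    """Return (number of complete models, number of partial models)."""
    complete = sum(is_complete(cds) for cds in cds_list)
    return complete, len(cds_list) - complete


print(tally_models(["ATGAAATAA", "ATGAAA", "CCCTGA"]))  # prints (1, 2)
```

With the counts reported above, 5,779 complete plus 3,630 partial models give the 9,409-model total.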

A full list of the annotation information is provided in supplementary table S3, Supplementary Material online.

Reclassification of SMF2 as Tl

In this study, SMF2 was reclassified based on molecular data. First, SMF2 was classified using the TrichOKEY v. 2.0 server (http://isth.info/, last accessed February 11, 2014) (Druzhinina et al. 2005). With the internal transcribed spacer sequence (GenBank Accession FJ605099.1) as input, SMF2 was classified as "Trichoderma longibrachiatum-Hypocrea orientalis." To further clarify the taxonomy of SMF2, the concatenated sequences of the tef1 (SMF2FGGW_107796), cal1 (SMF2FGGW_102813), and chi18-5 (SMF2FGGW_107557) genes were used to construct a Bayesian phylogenetic tree as described (Druzhinina et al. 2012). As shown in figure 3, the Bayesian phylogenetic tree included SMF2 and another 91 strains, representing 20 formally described species plus four phylogenetic species and lone lineages within the Longibrachiatum clade. All the species except T. parareesei and T. flagellatum are monophyletic, with the branches supported by posterior probabilities >0.5. SMF2 clusters within the Tl branch with a posterior probability of 1. Taken together, these results indicate that strain SMF2 should be reclassified as Tl.

Protein Families Expanded and Shrunken in Tl

Based on Pfam domain annotation, the protein families were compared among the four Trichoderma species (supplementary table S4, Supplementary Material online). A protein number difference of >2 or <-2 was used as the criterion for expansion or shrinkage, respectively. Comparison of Tl with the other three Trichoderma species, Tr, Ta, and Tv, revealed that 42, 22, and 9 families, respectively, were expanded and 44, 147, and 325 families, respectively, were shrunken in Tl. The recently derived species Tl and Tr were also compared with the early-branching species Ta and Tv. Compared with Ta and Tv, 175 families were shrunken in Tl and Tr, and no families were expanded (supplementary table S4, Supplementary Material online). The most shrunken families include proteins containing zinc finger domains (Zn_clus, PF00172), proteins containing ankyrin repeats (Ank_2, PF12796; Ank, PF00023), fungal-specific transcription factors (Fungal_trans, PF04082; Fungal_trans_2, PF11951), major facilitator superfamily transporters (MFS_1, PF07690), sugar (and other) transporters (Sugar_tr, PF00083), proteins containing an NTPase domain (NACHT, PF05729), short-chain dehydrogenases (adh_short, PF00106), proteins containing an alcohol dehydrogenase GroES-like domain (ADH_N, PF08240), zinc-binding dehydrogenases (ADH_zinc_N, PF00107), alpha/beta hydrolases (Abhydrolase_6, PF12697), subtilases (Peptidase_S8, PF00082), NmrA-like negative transcriptional regulators (NmrA, PF05368), heterokaryon incompatibility proteins (HET, PF06985), phosphorylase superfamily proteins (PNP_UDP_1, PF01048), and proteins containing WD40 repeats (WD40, PF00400).

Fig. 3. Bayesian phylogenetic tree based on the concatenated alignments of the tef1, cal1, and chi18-5 genes. Posterior probabilities (>0.5) are shown as percentages.
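The expansion/shrinkage criterion for protein families can be sketched in Python, assuming per-species protein counts for each Pfam family. How the joint Tl+Tr versus Ta+Tv comparison was combined is not stated, so the `shrunken_in_all` helper, which requires shrinkage in every pairwise comparison, is a hypothetical interpretation:

```python
def classify_family(count_a, count_b, threshold=2):
    """Classify a family in species A relative to species B."""
    diff = count_a - count_b
    if diff > threshold:
        return "expanded"
    if diff < -threshold:
        return "shrunken"
    return "unchanged"


def shrunken_in_all(counts, focal=("Tl", "Tr"), others=("Ta", "Tv")):
    """True if the family is shrunken in every focal-vs-other comparison."""
    return all(
        classify_family(counts.get(f, 0), counts.get(o, 0)) == "shrunken"
        for f in focal
        for o in others
    )


print(classify_family(10, 20))                                # prints shrunken
print(shrunken_in_all({"Tl": 3, "Tr": 4, "Ta": 10, "Tv": 12}))  # prints True
```

Note that a difference of exactly 2 counts as unchanged, because the criterion is strictly greater than 2 (or less than -2).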

Orthologous Genes in the Four Sequenced Trichoderma Genomes

We used BlastP and OrthoMCL (Li et al. 2003) to analyze the orthologous genes among the four sequenced Trichoderma spp. Because the genome sequences of the four species are all incomplete, the gene numbers and the homology family numbers are likely underestimated. However, the high percentage of core gene families suggests that most of the genes are included in the current genome assemblies.

The results showed that all 42,828 genes from the four genomes are classified into 15,360 families (fig. 4; see supplementary table S5, Supplementary Material online, for a full list of the homology families). The four species share a large core set of 7,656 families. The core family number rises to 8,556 if families absent from only one species are also counted. Of the 7,656 core families, 7,411 (96.8%) have an equal number of homologous genes in each species (including 7,361 "1:1:1:1"-type, 44 "2:2:2:2"-type, four "3:3:3:3"-type, and two "4:4:4:4"-type families). There were 506 Tl-specific families, most of which were not annotated by searching against SwissProt and KEGG. The annotated proteins include a zinc-type alcohol dehydrogenase-like protein, a family GH89 glycoside hydrolase (GH), and a number of peptidases, including a family M4 metallopeptidase, a family S9 serine peptidase, a peptidase Clp (type 1), a family M20A metallopeptidase, a family C26 cysteine peptidase, and a family S12 serine peptidase.

Fig. 4. Venn diagrams of homologous genes in four Trichoderma species. Numbers in circles indicate the numbers of homologous protein families.

It has been suggested that nitrate reductase may help Trichoderma species survive on nitrogen-deprived decaying wood (Slot and Hibbett 2007). Annotation revealed a nitrate reductase (SMF2FGGW_103702) and a molybdopterin molybdotransferase (SMF2FGGW_103701) in the Tl-specific families. The genes encoding these two enzymes form a cluster in the genome, suggesting that this nitrate reductase may be functional. In addition to this nitrate reductase, the Tl genome encodes two other nitrate reductases. One (SMF2FGGW_107092), which forms a cluster with a nitrite reductase (SMF2FGGW_107093) and a nitrate/nitrite transporter (SMF2FGGW_107094), is present in all four Trichoderma species, and the other (SMF2FGGW_107094) is present in Ta and Tv but not in Tr. The presence of these nitrate reductases suggests a high nitrogen-acquiring ability of Tl.

Pairwise comparison showed that the mycoparasitic species Ta and Tv share the largest number of families (9,097). Tl shares more families with Tv (8,316) and Tr (8,262) than with Ta (8,043), suggesting that Tl is more closely related to Tv and Tr than to Ta. To clarify the phylogenetic position of Tl, a neighbor-joining tree was constructed based on the 5,145 orthologous proteins (alignment including 2,143,124 sites without gaps) present in all four Trichoderma spp. and the Fusarium graminearum and Neurospora crassa genomes (fig. 1). This tree showed that Tl is most closely related to Tr among the studied genomes. Considering that Tl and Tr share a large proportion of genes, it is likely that the loss of mycoparasitism-related genes occurred before the speciation of Tl and Tr.
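The core, species-specific, and pairwise family counts reported in this section can be derived from the family table with a few set operations. A Python sketch, assuming each family is represented as a mapping from species code to gene count (the toy families below are illustrative, not the study's data):

```python
SPECIES = ("Tl", "Tr", "Ta", "Tv")


def summarize(families, species=SPECIES):
    """Count core, equal-copy core, species-specific, and pairwise-shared families."""
    present = {fam: {s for s in species if counts.get(s, 0) > 0}
               for fam, counts in families.items()}
    core = [f for f, sp in present.items() if len(sp) == len(species)]
    equal_core = [f for f in core if len({families[f][s] for s in species}) == 1]
    specific = {s: sum(1 for sp in present.values() if sp == {s}) for s in species}
    shared = {(a, b): sum(1 for sp in present.values() if {a, b} <= sp)
              for i, a in enumerate(species) for b in species[i + 1:]}
    return len(core), len(equal_core), specific, shared


families = {
    "F1": {"Tl": 1, "Tr": 1, "Ta": 1, "Tv": 1},
    "F2": {"Tl": 1, "Tr": 2, "Ta": 1, "Tv": 1},
    "F3": {"Tl": 1},
    "F4": {"Ta": 1, "Tv": 2},
}
core, equal_core, specific, shared = summarize(families)
print(core, equal_core, specific["Tl"], shared[("Ta", "Tv")])  # prints 2 1 1 3

# Share of equal-copy core families reported in the text: 7,411 of 7,656.
print(f"{100 * 7411 / 7656:.1f}%")  # prints 96.8%
```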

Carbohydrate-Active Enzymes 

CAZymes play key roles in the degradation of plant and fungal cell wall polysaccharides by Trichoderma spp. (Martinez et al. 2008; Kubicek et al. 2011). Here, we carefully annotated the CAZymes in Tl as well as in Tr, Ta, and Tv using the Pfam database (see Materials and Methods for details). The Pfam domains used in the annotation and classification are listed in supplementary table S6, Supplementary Material online. A summary of the annotation results is presented in table 1. Details for CAZy families are shown in supplementary tables S7-S11, Supplementary Material online.

Although the numbers of annotated CAZymes for Tr, Ta, and Tv differ from those in a previous study (Kubicek et al. 2011) (supplementary table S7, Supplementary Material online), the same conclusion can be drawn: Tr has fewer glycoside hydrolases (GH), glycosyltransferases (GT), polysaccharide lyases (PL), and carbohydrate esterases (CE) than Ta and Tv. Annotation of the Tl genome revealed that, compared with Tr, Tl has fewer GHs (165 vs. 174), a similar number of GTs (93 vs. 94), and the same numbers of PLs and CEs. Therefore, Tl has the smallest set of CAZymes among the studied Trichoderma species.

One key process in mycoparasitism is the lysis of the prey's cell walls (Howell 2003; Harman et al. 2004; Lorito et al. 2010). The fungal cell wall is mostly composed of chitin, and therefore, chitinolytic enzymes are a key factor in the mycoparasitic attack (Harman et al. 2004; Seidl 2008). A previous study showed that, consistent with its weak mycoparasitic ability, the saprotrophic species Tr has fewer GH18 chitinases than mycoparasitic species (Kubicek et al. 2011). Pfam annotation led to the same conclusion (table 2; see supplementary tables S12-S14, Supplementary Material online, for details of the nutrition-related CAZymes). In addition, annotation of the Tl genome showed that Tl encodes fewer GH18 chitinases than Tr (17 vs. 19). See supplementary figure S1, Supplementary Material online, for a phylogenetic tree of the GH18 chitinases. A previous study also showed that Tr has fewer GH75 chitosanases than mycoparasitic species (Kubicek et al. 2011). Annotation of the Tl genome revealed the same number of chitosanase genes (8) as in Tr. In addition to chitin, the central core of the cell wall of almost all fungi contains β-1,3/1,6-glucan (Latge 2007). Therefore, β-1,3-glucanases and β-1,6-glucanases were also annotated. The results showed that the Tl and Tr genomes encode the same numbers of β-1,3-glucanases and β-1,6-glucanases (13 in all; table 2; supplementary table S12, Supplementary Material online).

The ability to live saprotrophically on plant cell wall polysaccharides depends on the production of cellulolytic and hemicellulolytic enzymes. Annotation revealed that the Tl and Tr genomes encode the same numbers of cellulolytic enzymes and hemicellulolytic enzymes (17 and 10, respectively; table 2; supplementary tables S13 and S14, Supplementary Material online). It was also noted that the Tl and Tr genomes encode fewer cellulolytic and hemicellulolytic enzymes than Tv and Ta.

We further estimated the gene duplication and loss events for the chitinases, β-1,3/1,6-glucanases, cellulolytic enzymes, and hemicellulolytic enzymes during the speciation of Trichoderma using a parsimony method (see Materials and Methods). We created phylogenetic trees for all groups/subgroups of chitinases (26 groups plus 6 subgroups), β-1,3/1,6-glucanases (16 groups plus 5 subgroups), cellulolytic enzymes (25 groups), and hemicellulolytic enzymes (17 groups plus 6 subgroups). The phylogenetic tree in figure 1 was used as the species tree. By identifying the differences between each gene tree and the species tree and fitting the gene tree into the species tree, we counted the number of genes at each ancestral node and the gene duplications and losses along each branch of the species tree.
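The gene tree/species tree fitting described above is a parsimony reconciliation in the spirit of Goodman et al. (1979) and Page and Charleston (1997). The sketch below shows the standard LCA-mapping version under stated assumptions; it is not the authors' code. Trees are nested 2-tuples, gene leaves are assumed to be named "<species>_<id>" (a hypothetical convention), and the species topology (((Tl, Tr), Tv), Ta) is assumed only for illustration and should be replaced by the tree in figure 1. A gene node counts as a duplication when it maps to the same species node as one of its children; each species node skipped along a gene-tree edge counts as one loss.

```python
"""Sketch: count duplications and losses by LCA-mapping a gene tree onto a
species tree. Leaf naming "<species>_<id>" and the species topology below
are illustrative assumptions, not taken from the paper."""
from typing import Dict, FrozenSet, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a (left, right) pair
Clade = FrozenSet[str]


def index_clades(tree: Tree, depth_of: Dict[Clade, int], depth: int = 0) -> Clade:
    """Record every species-tree clade (as a set of species) with its depth."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        left = index_clades(tree[0], depth_of, depth + 1)
        right = index_clades(tree[1], depth_of, depth + 1)
        clade = left | right
    depth_of[clade] = depth
    return clade


def lca(a: Clade, b: Clade, depth_of: Dict[Clade, int]) -> Clade:
    """Deepest species clade containing both a and b (species clades are nested)."""
    union = a | b
    return max((c for c in depth_of if union <= c), key=lambda c: depth_of[c])


def reconcile(gene: Tree, depth_of: Dict[Clade, int]) -> Tuple[Clade, int, int]:
    """Return (species clade this gene node maps to, duplications, losses)."""
    if isinstance(gene, str):
        return frozenset([gene.split("_")[0]]), 0, 0
    m1, dup1, loss1 = reconcile(gene[0], depth_of)
    m2, dup2, loss2 = reconcile(gene[1], depth_of)
    m = lca(m1, m2, depth_of)
    is_dup = m in (m1, m2)  # a child maps to the same species node
    losses = loss1 + loss2
    for child in (m1, m2):
        gap = depth_of[child] - depth_of[m]  # species levels crossed on this edge
        losses += gap if is_dup else gap - 1
    return m, dup1 + dup2 + int(is_dup), losses


def count_events(gene_tree: Tree, species_tree: Tree) -> Tuple[int, int]:
    """Total (duplications, losses) for one gene family."""
    depth_of: Dict[Clade, int] = {}
    index_clades(species_tree, depth_of)
    root_map, dups, losses = reconcile(gene_tree, depth_of)
    # Assume the family existed in the MRCA of all species: every species-tree
    # level above the gene-tree root's mapping is one lost sibling lineage.
    return dups, losses + depth_of[root_map]


# Assumed species topology, for illustration only.
SPECIES = ((("Tl", "Tr"), "Tv"), "Ta")
# Hypothetical family: an ancestral duplication, one copy later lost in the Tl-Tr lineage.
GENES = (((("Tl_1", "Tr_1"), "Tv_1"), "Ta_1"), ("Tv_2", "Ta_2"))
print(count_events(GENES, SPECIES))  # (1, 1)
```

The same mapping also gives the number of gene copies at each ancestral species node, which is what figure 2 summarizes; that bookkeeping is omitted here to keep the sketch short.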

As shown in figure 2A-D, there are 40 chitinase genes, 19 β-1,3/1,6-glucanase genes, 27 cellulolytic enzyme genes, and 24 hemicellulolytic enzyme genes in the most recent common ancestor (MRCA) of the four Trichoderma species, all more than in any of the extant species. Gene loss events dominate the fluctuation in gene number for all four classes of enzymes during the evolution of the four Trichoderma species. Only two duplications were detected across the four classes of nutrition-related enzymes, both in chitinase genes (one in Ta and the other in Tv). Therefore, for each class of enzymes, the total number of genes lost along a lineage can be estimated from the number of genes in the extant species (i.e., by subtracting the number of genes in the extant species from the number of genes in the MRCA of the four Trichoderma species). For both fungal cell wall polysaccharide-degrading enzymes and plant cell wall polysaccharide-degrading enzymes, the most significant gene losses occurred in the MRCA of Tl and Tr, where 26.3-42.5% of the genes present in the MRCA of the four Trichoderma species were lost.
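As a quick check of the subtraction described above, the sketch below uses only counts quoted in this section. Because no duplications were detected on the Tl or Tr lineages, losses along the path from the four-species MRCA to Tl (or Tr) are simply the MRCA count minus the extant count. Chitinases are left out because the text does not make clear whether the 40 MRCA "chitinase" genes cover only GH18 or also GH75 genes, and we assume the class definitions behind figure 2 match those behind table 2. These are totals over the whole root-to-tip path, so they are upper bounds on, rather than reproductions of, the 26.3-42.5% losses attributed to a single branch.

```python
# Gene counts quoted in the text: MRCA of the four species (fig. 2) and the
# extant Tl and Tr genomes (table 2; identical in Tl and Tr for these classes).
MRCA = {"beta-1,3/1,6-glucanases": 19, "cellulolytic": 27, "hemicellulolytic": 24}
EXTANT_TL_TR = {"beta-1,3/1,6-glucanases": 13, "cellulolytic": 17, "hemicellulolytic": 10}
DUPLICATIONS_ON_LINEAGE = 0  # the only two duplications were chitinases in Ta and Tv


def lineage_losses(mrca: int, extant: int, dups: int = 0) -> int:
    """Losses on a root-to-tip path, from extant = mrca + dups - losses."""
    return mrca + dups - extant


for cls, n in MRCA.items():
    lost = lineage_losses(n, EXTANT_TL_TR[cls], DUPLICATIONS_ON_LINEAGE)
    print(f"{cls}: {lost} of {n} lost ({100 * lost / n:.1f}%)")
# beta-1,3/1,6-glucanases: 6 of 19 lost (31.6%)
# cellulolytic: 10 of 27 lost (37.0%)
# hemicellulolytic: 14 of 24 lost (58.3%)
```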

Secondary Metabolism 

Secondary metabolites are probably related to the mycoparasitism of Trichoderma spp. Genes related to secondary metabolism in Tl were predicted using SMURF (Khaldi et al. 2010). For comparison, the Tr, Ta, and Tv genomes were also analyzed using SMURF. As shown in table 4, the Tl genome encodes the smallest number (22) of nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) genes, including hybrid NRPS/PKS and NRPS-/PKS-like genes, among the four sequenced Trichoderma genomes (27 for Tr, 41 for Ta, and 56 for Tv). The smaller number of NRPS/PKS genes in Tl than in Tr suggests that the secondary metabolism of Tl is probably simpler than that of Tr. In addition, annotation using the Pfam database indicated that the domain structure of NRPSs differs among species, suggesting that there are large differences between the secondary metabolites of different Trichoderma species (see supplementary table S3, Supplementary Material online, for details of Pfam annotation).

Table 4

Comparison of NRPS, PKS, and NRPS/PKS of Trichoderma

Species   NRPS   PKS   NRPS/PKS   NRPS-Like   PKS-Like   Total
Tl           6     9          2           4          1      22
Tr           8    11          2           5          1      27
Ta          12    18          2           8          1      41
Tv          19    20          2          14          1      56

The longest gene in the Tl genome, SMF2FGGW_105489 (69,506 bp), was predicted by SMURF to be a hybrid NRPS/PKS gene. Like long NRPS genes in other Trichoderma spp. (Wiest et al. 2002), it is nearly intron-less, containing only two introns. Pfam analysis suggested that the protein (23,045 aa) encoded by SMF2FGGW_105489 is responsible for the synthesis of 20-aa peptaibols. The second longest gene in the Tl genome is a 43,447-bp NRPS gene (SMF2FGGW_101095). It is also nearly intron-less, containing only two introns. Pfam analysis suggested that it encodes an NRPS for the synthesis of 12-aa peptaibols. These results agree well with our previous studies showing that Tl produces a large amount of 20-aa peptaibols and a small amount of 12-aa peptaibols (Song et al. 2006, 2007). Compared with Tl, the longest NRPSs encoded by the other three sequenced Trichoderma genomes are shorter, and accordingly, the longest peptaibols synthesized by these three species are also shorter (18 aa for Tr and Tv and 19 aa for Ta).
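The text does not say how the Pfam domain calls were converted into a predicted peptide length. One common heuristic, assumed here rather than taken from the paper, is the NRPS colinearity rule: each elongation module adds one residue, so counting adenylation domains (Pfam AMP-binding, PF00501) along the protein predicts the peptide length. The architecture below is a hand-built toy, not the real SMF2FGGW_105489 annotation; in practice, AMP-binding hits from non-NRPS starter domains (for example, acyl-CoA ligase-like domains) would need to be screened out first.

```python
# Sketch of the NRPS colinearity heuristic (an assumption, not the authors' pipeline).
# Domain names follow Pfam: AMP-binding = adenylation (A), PP-binding = thiolation (T),
# Condensation = C. A real analysis would parse hmmscan output instead of a toy list.
from typing import List


def predicted_peptide_length(domains: List[str]) -> int:
    """One residue per adenylation (AMP-binding) domain, in N- to C-terminal order."""
    return sum(1 for d in domains if d == "AMP-binding")


def toy_nrps(n_modules: int) -> List[str]:
    """Hypothetical linear NRPS: an A-T initiation module, then C-A-T elongation modules."""
    arch: List[str] = []
    for i in range(n_modules):
        if i > 0:
            arch.append("Condensation")
        arch += ["AMP-binding", "PP-binding"]
    return arch


print(predicted_peptide_length(toy_nrps(20)))  # 20
```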

Proteases 

Proteases are important enzymes that may contribute to cell wall degradation in both fungal and animal pathogens. Recent comparative transcriptomics of Tr, Tv, and Ta showed that expression of proteases is up-regulated during confrontation with the plant pathogenic fungus Rhizoctonia solani, indicating that proteases may play roles in antagonism against pathogenic fungi (Atanasova et al. 2013). However, the proteases of Trichoderma have not been systematically compared at the whole-genome scale. Here, we systematically annotated protease genes in the Tl, Tr, Tv, and Ta genomes using the MEROPS database (Rawlings et al. 2012). The results (table 5) showed that Tl (238) and Tr (239) have smaller sets of proteases than Tv (318) and Ta (335). Serine proteases account for the majority (~80%) of the total difference, whereas metalloproteases account for over 10%. Based on the MEROPS classification, S08, S09, and S33 are the largest families of annotated serine proteases (see supplementary table S15, Supplementary Material online, for complete lists of proteases of different families). S08 family proteases, also known as subtilisin-like proteases, have been found to play roles in the mycoparasitism of Trichoderma (Atanasova et al. 2013). Tv and Ta have more S08 proteases than Tl and Tr. In addition, Tv and Ta also have more S09, S12, S33, and S53 proteases, suggesting that these families may also contribute to the mycoparasitism of Trichoderma.
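The text does not define "total difference". Assuming it means the summed protease counts of the two mycoparasitic species (Ta, Tv) minus those of Tl and Tr, the counts in table 5 reproduce the quoted shares, as this short check shows.

```python
# Protease counts per MEROPS catalytic type, copied from table 5.
TABLE5 = {
    "Tl": {"A": 15, "C": 36, "G": 4, "M": 64, "S": 98, "T": 21},
    "Tr": {"A": 14, "C": 38, "G": 4, "M": 62, "S": 101, "T": 20},
    "Ta": {"A": 18, "C": 39, "G": 6, "M": 75, "S": 176, "T": 21},
    "Tv": {"A": 16, "C": 42, "G": 4, "M": 74, "S": 162, "T": 20},
}


def group_total(species, ptype=None):
    """Sum counts over the given species for one catalytic type, or all types if None."""
    return sum(v for s in species for t, v in TABLE5[s].items() if ptype in (None, t))


myco, sapro = ("Ta", "Tv"), ("Tl", "Tr")
total_diff = group_total(myco) - group_total(sapro)  # 653 - 477 = 176
for ptype in ("S", "M"):
    diff = group_total(myco, ptype) - group_total(sapro, ptype)
    print(ptype, diff, f"{100 * diff / total_diff:.1f}%")
# S 139 79.0%
# M 23 13.1%
```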

Table 5

Comparison of Proteases of Trichoderma

Species    A    C    G    M     S    T   Total
Tl        15   36    4   64    98   21     238
Tr        14   38    4   62   101   20     239
Ta        18   39    6   75   176   21     335
Tv        16   42    4   74   162   20     318

Note. A, aspartic proteases; C, cysteine proteases; G, glutamic proteases; M, metalloproteases; S, serine proteases; T, threonine proteases.

Purifying Selection on Nutrition-Related Genes

Tl is an efficient producer of cellulases and also a (potential) opportunistic human pathogen. Compared with the ancient nutrition strategy of mycotrophy, both the utilization of plant biomass and the utilization of nutrients from humans are recently developed nutrition strategies. The similarity of the Tl and Tr genomes suggests that the strategy of utilizing plant biomass developed in their common ancestor. Furthermore, the above analyses suggest that the numbers of genes for utilizing plant biomass decreased in the common ancestor of Tl and Tr. We therefore expected these genes to be under stronger selection pressure in Tl and Tr than in Ta and Tv. The ratio (ω) of nonsynonymous substitutions per nonsynonymous site (dN) to synonymous substitutions per synonymous site (dS) can be used as an indicator of positive selection (ω > 1) and purifying selection (ω < 1). For each homologous group of nutrition-related genes from the four species, we compared the ω value for the pair Tl-Tr with that for the pair Ta-Tv to check whether the selection pressure differs. We calculated ω values for the nutrition-related enzymes, including chitinases, glucanases, cellulolytic enzymes, and hemicellulolytic enzymes (supplementary table S16, Supplementary Material online), and found that all these enzymes have ω values much lower than 1 in both Tl-Tr and Ta-Tv (fig. 5A). Therefore, these genes are under purifying rather than positive selection.
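Written out, the comparison set up above and tested below can be framed as follows. The per-group difference d_i and the one-sample t statistic are our reconstruction from the reported tests (one-sample t tests on the differences, with n - 1 degrees of freedom), not formulas quoted from the Materials and Methods.

```latex
\begin{aligned}
\omega &= d_N / d_S \\
d_i &= \omega_i(\text{Tl-Tr}) - \omega_i(\text{Ta-Tv}), \qquad i = 1, \dots, n \\
\bar{d} &= \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2} \\
t &= \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \nu = n - 1 \\
H_0 &: \mu_d \ge 0 \quad \text{vs.} \quad H_1 : \mu_d < 0, \qquad P = \Pr\left(T_\nu \le t\right)
\end{aligned}
```

For the 17 cellulolytic enzymes present in all four species, n = 17 and ν = 16, matching the degrees of freedom reported below.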

Cellulolytic enzymes have a mean ω value of 0.103 (standard deviation 0.051, median 0.096) in Ta-Tv and a mean ω value of 0.045 (standard deviation 0.026, median 0.039) in Tl-Tr. Among the 17 cellulolytic enzymes present in all four species, 16 have a lower ω value in Tl-Tr than in Ta-Tv (fig. 5B). Statistical analysis indicated that the ω values of cellulolytic enzymes in Tl-Tr are significantly lower than those in Ta-Tv (one-sample t test; null hypothesis: ω(Tl-Tr) − ω(Ta-Tv) ≥ 0; alternative hypothesis: ω(Tl-Tr) − ω(Ta-Tv) < 0; degrees of freedom = 16; P = 0.0001). Therefore, consistent with the above expectation, cellulolytic enzymes are subject to stronger purifying selection in Tl-Tr than in Ta-Tv.
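A minimal sketch of the one-sided test described above, assuming it is a one-sample t test on the per-group differences ω(Tl-Tr) − ω(Ta-Tv). To stay within the Python standard library, the t distribution CDF is integrated numerically with Simpson's rule; if SciPy is available, `scipy.stats.ttest_1samp(diffs, 0.0, alternative="less")` should give the same t and P. The ω values in the usage lines are hypothetical, not taken from supplementary table S16.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def t_pdf(x: float, nu: int) -> float:
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(t: float, nu: int, steps: int = 4000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry of the t density (steps must be even)."""
    a = abs(t)
    h = a / steps
    area = t_pdf(0.0, nu) + t_pdf(a, nu)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sided_paired_test(omega_sapro: Sequence[float],
                          omega_myco: Sequence[float]) -> Tuple[float, int, float]:
    """H0: mean(omega_sapro - omega_myco) >= 0 vs. H1: < 0. Returns (t, df, P)."""
    diffs: List[float] = [s - m for s, m in zip(omega_sapro, omega_myco)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1, t_cdf(t, n - 1)


# Hypothetical omega values for five homologous groups (illustration only).
omega_tl_tr = [0.04, 0.03, 0.05, 0.06, 0.02]
omega_ta_tv = [0.10, 0.08, 0.12, 0.09, 0.07]
t, df, p = one_sided_paired_test(omega_tl_tr, omega_ta_tv)
print(f"t = {t:.2f}, df = {df}, P = {p:.4f}")
```

The same function applies unchanged to the hemicellulolytic, chitinase, glucanase, and protease comparisons; only the input lists differ.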

The ω values of hemicellulolytic enzymes in Tl-Tr are smaller than those in Ta-Tv, but without statistical significance (one-sample t test; null hypothesis: ω(Tl-Tr) − ω(Ta-Tv) ≥ 0; alternative hypothesis: ω(Tl-Tr) − ω(Ta-Tv) < 0; degrees of freedom = 4; P = 0.096). For chitinases, the ω values in Tl-Tr are not statistically different from those in Ta-Tv (one-sample t test with the same hypotheses; degrees of freedom = 11; P = 0.265). Comparison of the ω values of the mycoparasitism-related β-1,3/1,6-glucanases revealed that their ω values in Tl-Tr are smaller than those in Ta-Tv, though without statistical significance (one-sample t test with the same hypotheses; degrees of freedom = 5; P = 0.059), indicating that these mycoparasitism-related β-1,3/1,6-glucanases probably play more important roles in Tl-Tr than expected.

Fig. 5. Comparison of selection pressure in Tl-Tr and Ta-Tv. (A) ω values for GH18 chitinases (blue squares), β-1,3/1,6-glucanases (red circles), cellulolytic enzymes (green up triangles), hemicellulolytic enzymes (cyan down triangles), and GH75 chitosanases (purple diamonds). (B) ω values for GH (red circles), GT (blue squares), CE (green up triangles), and PL (cyan down triangles). (C) ω values for aspartic proteases (blue squares), cysteine proteases (red circles), metalloproteases (green up triangles), serine proteases (cyan down triangles), and threonine proteases (purple diamonds). One data point for metalloproteases, with ω(Ta-Tv) = 0.107 and ω(Tl-Tr) = 0.347, was omitted for clarity. Lower ω values indicate stronger purifying selection.

We also calculated ω values for all the CAZymes (fig. 5B). Most enzymes have smaller ω values in Tl-Tr than in the mycoparasitic species, suggesting that the selection pressure on CAZymes has increased in Tl-Tr. Most enzymes have ω values < 0.1 in Ta-Tv. A small number of enzymes have relatively high ω values (> 0.1) in Ta-Tv, and their ω values are dramatically decreased in Tl-Tr (to less than half of those in Ta-Tv; fig. 5B). Notably, only about half of these enzymes belong to one of the above four classes of nutrition-related enzymes, suggesting that some other CAZymes are also important for the (probably nutrition-related) metabolism of Tl-Tr. Besides enzymes with decreased ω values, the analyses also revealed enzymes with increased ω values in Tl-Tr (fig. 5B), suggesting lower selection pressure on these enzymes in Tl-Tr.
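The two groups picked out above can be stated as simple filters: enzymes with ω(Ta-Tv) > 0.1 whose ω(Tl-Tr) falls below half of that value, and enzymes whose ω rises in Tl-Tr. The sketch below encodes those criteria; the enzyme names and ω values are hypothetical placeholders, not entries from supplementary table S16.

```python
from typing import Dict, List, Tuple

# Each entry maps an enzyme to (omega_Tl_Tr, omega_Ta_Tv).
Omegas = Dict[str, Tuple[float, float]]


def strongly_constrained(omegas: Omegas, high: float = 0.1, ratio: float = 0.5) -> List[str]:
    """Enzymes with omega(Ta-Tv) > high whose omega(Tl-Tr) < ratio * omega(Ta-Tv)."""
    return sorted(name for name, (w_sapro, w_myco) in omegas.items()
                  if w_myco > high and w_sapro < ratio * w_myco)


def increased_in_tl_tr(omegas: Omegas) -> List[str]:
    """Enzymes whose omega is higher in Tl-Tr than in Ta-Tv (weaker purifying selection)."""
    return sorted(name for name, (w_sapro, w_myco) in omegas.items() if w_sapro > w_myco)


# Hypothetical values, for illustration only.
example: Omegas = {
    "GH_a": (0.04, 0.15),  # high in Ta-Tv, under half in Tl-Tr -> constrained
    "GH_b": (0.09, 0.12),  # high in Ta-Tv, only mildly lower -> not selected
    "GT_c": (0.03, 0.05),  # already low in Ta-Tv -> not selected
    "CE_d": (0.20, 0.11),  # higher in Tl-Tr -> increased
}
print(strongly_constrained(example))  # ['GH_a']
print(increased_in_tl_tr(example))    # ['CE_d']
```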



We also analyzed the selection pressure on proteases (fig. 5C). One-sample t tests showed that aspartic proteases (degrees of freedom = 13; P = 0.0003), serine proteases (degrees of freedom = 65; P = 0.0004), and metalloproteases (degrees of freedom = 50; P = 0.0294) have lower ω values in Tl-Tr than in Ta-Tv, suggesting that these proteases are under stronger purifying selection in Tl-Tr than in Ta-Tv.



Discussion 

The saprotrophic species Tr is a model for the study of Trichoderma physiology. Comparative genomics showed that Tr has a smaller genome than the mycoparasitic species Tv and Ta, suggesting that gene loss events occurred in the ancestor of Tr. In this study, we sequenced the genome of Tl, a close relative of Tr. Homology and phylogenetic analyses suggest that the gene loss events occurred in the common ancestor of Tl and Tr. In addition, Tl has fewer mycoparasitism-related genes, including CAZymes and NRPS/PKS genes, than Tr, suggesting that additional gene loss events occurred in Tl after its divergence from Tr.

The decrease in mycoparasitic ability can be attributed to the decrease in the number of mycoparasitism-related genes. However, the development of the ability to grow saprotrophically on plant biomass does not seem to result from acquiring additional genes for enzymes that degrade plant biomass. Tr is an efficient producer of cellulases and hemicellulases and is the major industrial source of these enzymes. Tl is also an efficient cellulase producer. However, comparison of cellulolytic and hemicellulolytic enzymes indicates that the number of these genes did not expand but instead decreased in Tl-Tr. Taken together, the ability to grow saprotrophically on plant biomass and the highly efficient production of cellulolytic and hemicellulolytic enzymes suggest that these enzymes may have been optimized for higher specific activities and/or expression levels in Tl-Tr.

A previous study has shown that several Trichoderma chitinase genes have codons under positive selection (Ihrmark et al. 2010). We calculated the ω values for the homologous groups and found that all the analyzed chitinases, β-1,3/1,6-glucanases, cellulolytic enzymes, and hemicellulolytic enzymes have ω values smaller than 1, suggesting that these enzymes are under purifying selection. Therefore, at the whole-gene level, purifying selection dominates their evolution. Comparison of ω values shows that the cellulolytic enzymes have lower ω values in Tl-Tr than in Ta-Tv. In contrast, the ω values of chitinases in Tl-Tr are not statistically different from those in Ta-Tv. These results indicate that the nutrition strategy of saprotrophy on plant biomass imposes strong selection pressure on cellulolytic enzymes.
Conclusions

Tl has a genome of 31.7 Mb, the smallest among the four sequenced species. Among the sequenced species, Tl is most closely related to Tr. Gene loss events probably occurred in the common ancestor of Tl and Tr, resulting in the smaller genome sizes of Tl and Tr compared with Tv and Ta. The development of a new nutrition strategy is related not only to the decrease in nutrition-related genes (especially fungal cell wall polysaccharide-degrading enzymes) but also to increased selection pressure on nutrition-related genes (especially plant cell wall polysaccharide-degrading enzymes). This study provides insights into the physiology and evolution of the nutrition strategy of Trichoderma and may help in developing improved biocontrol strains and cellulase- and hemicellulase-producing strains.
Supplementary tables S1-S17 and figure S1 are available at
Genome Biology and Evolution online (http://www.gbe.
Acknowledgments

This work was supported by the Hi-Tech Research and Development Program of China (2011AA090704), the National Natural Science Foundation of China (31270064, 31025001, 81071804, and 81271896), the Program of Shandong for Taishan Scholars (2009TS079), and the Independent Innovation Foundation of Shandong University (2011DX002 and 2012TB004).

Literature Cited 

Atanasova L, et al. 2013. Comparative transcriptomics reveals different strategies of Trichoderma mycoparasitism. BMC Genomics 14:121.

Birney E, Durbin R. 2000. Using GeneWise in the Drosophila annotation experiment. Genome Res. 10:547-548.

Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27:578-579.

Castresana J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 17:540-552.

Chen LL, et al. 2009. Characterization and gene cloning of a novel serine protease with nematicidal activity from Trichoderma pseudokoningii SMF2. FEMS Microbiol Lett. 299:135-142.

Chevreux B, Wetter T, Suhai S. 1999. Genome sequence assembly using trace signals and additional sequence information. German Conference on Bioinformatics, GCB '99; 1999 Oct 4-6; Hannover, Germany: Comput. Sci. Biol.: Proc. German Conference on Bioinformatics GCB'99 GCB. p. 45-56.

Druzhinina IS, et al. 2005. An oligonucleotide barcode for species identification in Trichoderma and Hypocrea. Fungal Genet Biol. 42:813-828.

Druzhinina IS, et al. 2011. Trichoderma: the genomics of opportunistic success. Nat Rev Microbiol. 9:749-759.

Druzhinina IS, et al. 2012. Molecular phylogeny and species delimitation in the section Longibrachiatum of Trichoderma. Fungal Genet Biol. 49:358-368.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792-1797.

Goodman M, Czelusniak J, Moore GW, Romero-Herrera AE, Matsuda G. 1979. Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences. Syst Zool. 28:132-168.

Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser. 41:95-98.

Harman GE, Howell CR, Viterbo A, Chet I, Lorito M. 2004. Trichoderma 
species — opportunistic, avirulent plant symbionts. Nat Rev Microbiol. 
2:43-56. 

Howell CR. 2003. Mechanisms employed by Trichoderma species in the biological control of plant diseases: the history and evolution of current concepts. Plant Dis. 87:4-10.

Ihrmark K, et al. 2010. Comparative molecular evolution of Trichoderma chitinases in response to mycoparasitic interactions. Evol Bioinform Online. 6:1-26.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277-D280.

Khaldi N, et al. 2010. SMURF: genomic mapping of fungal secondary metabolite clusters. Fungal Genet Biol. 47:736-741.

Kubicek CP, et al. 2011. Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma. Genome Biol. 12:R40.

Lagesen K, et al. 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35:3100-3108.

Latge JP. 2007. The cell wall: a carbohydrate armour for the fungal cell. Mol Microbiol. 66:279-290.

Li L, Stoeckert CJ Jr, Roos DS. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178-2189.

Lorito M, Woo SL, Harman GE, Monte E. 2010. Translational research on Trichoderma: from 'omics to the field. Annu Rev Phytopathol. 48:395-417.

Luo Y, et al. 2010. Antimicrobial peptaibols induce defense responses and systemic resistance in tobacco against tobacco mosaic virus. FEMS Microbiol Lett. 313:120-126.

Martinez D, et al. 2008. Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina). Nat Biotechnol. 26:553-560.

Nam J, Nei M. 2005. Evolutionary change of the numbers of homeobox genes in bilateral animals. Mol Biol Evol. 22:2386-2394.

Niimura Y, Nei M. 2007. Extensive gains and losses of olfactory receptor genes in mammalian evolution. PLoS One 2:e708.

Page RD, Charleston MA. 1997. From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol Phylogenet Evol. 7:231-240.

Rawlings ND, Barrett AJ, Bateman A. 2012. MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 40:D343-D350.

Ronquist F, et al. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539-542.

Rossman AY, Samuels GJ, Rogerson CT, Lowen R. 1999. Genera of Bionectriaceae, Hypocreaceae and Nectriaceae (Hypocreales, Ascomycetes). Stud Mycol. 42:1-83.

Salamov AA, Solovyev VV. 2000. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10:516-522.

Samuels GJ, et al. 2012. The Longibrachiatum clade of Trichoderma: a revision with new species. Fungal Divers. 55:77-108.



Genome Biol. Evol. 6(2):379-390. doi:10.1093/gbe/evu018 Advance Access publication January 29, 2014 






Xie et al.



GBE 



Schattner P, Brooks AN, Lowe TM. 2005. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33:W686-W689.

Seidl V. 2008. Chitinases of filamentous fungi: a large group of diverse proteins with multiple physiological functions. Fungal Biol Rev. 22:36-42.

Shi M, et al. 2010. Antimicrobial peptaibols from Trichoderma pseudokoningii induce programmed cell death in plant fungal pathogens. Microbiology 158:166-175.

Slot JC, Hibbett DS. 2007. Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study. PLoS One 2:e1097.

Song XY, et al. 2006. Broad-spectrum antimicrobial activity and high stability of Trichokonins from Trichoderma koningii SMF2 against plant pathogens. FEMS Microbiol Lett. 260:119-125.

Song XY, et al. 2007. Solid-state fermentation for Trichokonins production from Trichoderma koningii SMF2 and preparative purification of Trichokonin VI by a simple protocol. J Biotechnol. 131:209-215.

Su HN, et al. 2012. Antimicrobial peptide trichokonin VI-induced alterations in the morphological and nanomechanical properties of Bacillus subtilis. PLoS One 7:e45818.

Tamura K, et al. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 28:2731-2739.

Wiest A, et al. 2002. Identification of peptaibols from Trichoderma virens and cloning of a peptaibol synthetase. J Biol Chem. 277:20862-20868.

Zhang Z, et al. 2006. KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics. 4:259-263.

Associate editor: Yoshihito Niimura 






